ABSTRACT


Coaches and sport analysts are increasingly interested in analyzing sport
performance through match videos. However, they still rely on the
conventional method of manually observing the full video, which is
troublesome because meaningful information present in the video may be
missed. Several previous studies have discussed ball tracking, player
identification based on jersey color and number, and player movement
detection in various sports such as soccer and volleyball, but not in
badminton. Therefore, this study focuses on developing an automated system
that uses a Faster Region-based Convolutional Neural Network (Faster R-CNN)
to track the position of badminton players in sport broadcast video. To
prepare the dataset for training and testing, several broadcast videos were
converted into image frames, and the regions containing the players were
labelled. Several Faster R-CNN detectors were then trained on the dataset
and tested with different sets of videos to evaluate detector performance.
The performance of each detector model was measured by the average
precision obtained from its precision-recall graph. The results show that
the detector successfully detects the players when it is trained with a
more generalized dataset.
1. INTRODUCTION

Video is a form of unstructured data [1] whose meaning can only be obtained through analysis.
Sport video analysis is receiving growing attention because analyzing previous games helps coaches
rearrange the players' positions or improve the players' performance. The task of analyzing sport can be
divided into movement classification [2-3], player detection [4] and activity classification [5]. Several
previous works proposed the identification and tracking of players in sport video [6], while others
characterized the content of sport programs by applying several techniques to extracted low-level features
such as audio, video and text captions [7]. Most research in sport video analysis focuses on three issues:
shot classification, highlight extraction, and ball and player tracking [8-9]. Many analysts are keen to use
video as the medium for their observations and for summarizing the tactical analysis of the athlete. However,
the conventional method, manual video notation, is very time consuming [10-11].
Analyzing a video requires a large computational capacity because the dataset is large. Deep learning
has been described as the most promising approach compared with conventional machine learning methods
[12]. It is a subfield of machine learning [13] and is better known than the earlier technique, shallow
learning. Because a deep network consists of an input layer, an output layer and two or more hidden layers,
it can be made large, which has made it the preferred method in many studies [10, 14]. Deep learning later
evolved to produce a model known as the Convolutional Neural Network (CNN). It is derived from the
Artificial Neural Network (ANN) but differs from it in the presence of the convolution operation in the
convolutional layer [15]. R-CNN, which came after CNN, brought a new technique to image processing by
operating on regions of interest (ROI) instead of the full image [16]. This model fulfilled the demand for
less training time [17]. Subsequently, these two models were combined to produce Fast R-CNN, as described
by Salvador et al. [16]. This approach reduces the training time because the network identifies the ROI
after convolution. The latest evolution is Faster R-CNN, which modifies the previous model so that both
image and region features are extracted efficiently [16]. Nowadays, CNN is widely used in analysis tasks
because it reduces the computational cost and gives results in near real time [18-19].

As mentioned above, Faster R-CNN is a deep learning method that has succeeded in detecting
objects regardless of rotation, pose, scale and illumination conditions. Several works have analyzed sport
video content for soccer [6], basketball and volleyball [8]. However, to our knowledge, the detection of
players using Faster R-CNN in broadcast video of badminton matches has not yet been studied. Therefore,
this study introduces badminton player detection from broadcast video using Faster R-CNN. The proposed
algorithm identifies the badminton players in a match, which could provide information about player
movement and help coaches monitor current performance and improve the players' capabilities.


2. RESEARCH MATERIAL AND METHOD 
2.1. Research Material 

Player detection is one of the important components in interpreting sport video. In this study,
broadcast videos of badminton matches were obtained from YouTube, and the software VirtualDub was used
to extract the sequence of image frames from the obtained videos. MATLAB R2017a was used to develop the
Faster R-CNN deep learning algorithm for both training and testing.


2.2. Research Flowchart 
Figure 1 shows the flow of the research. Each sub-process is presented in Section 2.2.1 to
Section 2.2.4.
Figure 1. Flowchart for developing Faster R-CNN


2.2.1 Video Selection 

Many broadcast sport videos are available online. Media such as YouTube, blogs and official sport
webpages offer broadcast videos ranging from previous games to the latest ones. In this study, Faster R-CNN
was applied to three videos of badminton matches: 1) Men Single Final Badminton Asia Championship 2017
(Figure 2); 2) All England Badminton Tournament Men's Single Final 2011 (Figure 3); and 3) Men's
Badminton Doubles Gold Medal Olympic 2012 (Figure 4). Each video has a resolution of 360p.
Figure 2. Example of image frame from video 1

Figure 3. Example of image frame from video 2

Figure 4. Example of image frame from video 3


2.2.2 Image Labelling as Dataset Preparation 

Analyzing continuous video is a difficult task because it requires specific software that accepts
video as input. As an alternative, image frames were extracted from the video using VirtualDub. This
software lets the user extract either the frames of the full video or only selected scenes of interest.

Only 100 image frames from each video were selected for the labelling session. Using the Training
Image Labeler app in MATLAB, the badminton players were labelled with bounding boxes, while the referee
and spectators were left unlabelled, as shown in Figures 5 to 7. This step builds the dataset of video frames
used for training and testing. To evaluate the effectiveness of Faster R-CNN under different conditions,
the data of each video were arranged according to several cases, which are described in the next section.
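
The text does not state how the 100 frames per video were chosen. As a minimal sketch, assuming the frames are taken at even intervals across the video (an assumption, not the documented procedure), the selection could look like the following. The function name `evenly_spaced_frames` and its parameters are hypothetical.

```python
def evenly_spaced_frames(total_frames, n_select=100):
    """Return n_select frame indices spread evenly over a video.

    Assumption: even spacing is only one plausible way to pick the
    frames; the study does not describe its selection rule.
    """
    if total_frames <= 0 or n_select <= 0:
        return []
    if n_select >= total_frames:
        # Fewer frames available than requested: keep them all.
        return list(range(total_frames))
    step = total_frames / n_select
    # Take the first frame of each equal-length segment of the video.
    return [int(i * step) for i in range(n_select)]


# Example: choose 100 frames from a hypothetical 5000-frame video.
indices = evenly_spaced_frames(5000)
```

Even spacing keeps the sample from clustering in one rally, but frames chosen by hand from scenes of interest would serve the labelling step equally well.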





Figure 5. Labelled image from broadcast video 1 


2.2.3 Preparing The Detector Model 
After the dataset was prepared, it was fed to the proposed R-CNN during the training process to
produce the trained R-CNN detector model. In the training session, six different trained models were
created, and each trained detector was then tested with the testing dataset according to the six cases
shown in Table 1.





Figure 6. Labelled image from broadcast video 2

Figure 7. Labelled image from broadcast video 3


Table 1. List of Model and Description

Case  Type of trained R-CNN detector model                               Testing dataset
1     Model 1: R-CNN was trained with video 1                            Video 1 (a single match)
2     Model 2: R-CNN was trained with video 2                            Video 2 (a single match)
3     Model 3: R-CNN was trained with combination of video 1 and 2       Combination of video 1 and 2 (both single matches)
4     Model 4: R-CNN was trained with video 3                            Video 3 (double match)
5     Model 5: R-CNN was trained with combination of video 1 and 3       Combination of single match and double match
6     Model 6: R-CNN was trained with combination of video 1, 2 and 3    Combination of both single matches and double match
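
The six training configurations in Table 1 can be written down as a small lookup, which makes it easy to see which videos feed each model. The sketch below is illustrative only: the mapping is transcribed from Table 1, while `build_training_set` and the `frames_by_video` structure are hypothetical names, and the way frames are split between training and testing is not stated in the text.

```python
# Training videos per model, transcribed from Table 1.
TRAINING_VIDEOS = {
    1: (1,),
    2: (2,),
    3: (1, 2),
    4: (3,),
    5: (1, 3),
    6: (1, 2, 3),
}


def build_training_set(model_id, frames_by_video):
    """Pool the labelled frames of every video used to train a model.

    frames_by_video maps a video number to a list of labelled frames
    (a hypothetical structure; any per-frame record would do).
    """
    frames = []
    for video in TRAINING_VIDEOS[model_id]:
        frames.extend(frames_by_video[video])
    return frames


# Placeholder frame names standing in for labelled images.
frames_by_video = {
    v: [f"video{v}_frame{i:03d}" for i in range(100)] for v in (1, 2, 3)
}
model_6_frames = build_training_set(6, frames_by_video)
```

With 100 placeholder frames per video, Model 6 pools 300 entries; the actual number of training frames depends on the split the authors used.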


2.2.4 Analyze the Performance for Each Model 

The trained models were then tested with several combinations of testing videos according to
Table 1 to evaluate the capability of each detector in tracking the position of the players. After
testing, each image frame is annotated with bounding boxes and a score indicating how confident the
detector is in each detection. To analyze the performance of the detector in each case, precision-recall
graphs were generated and the average precision was calculated from them.
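
To make the evaluation step concrete, the following sketch computes average precision as the area under the precision-recall curve. It rests on several assumptions that the text does not confirm: boxes are given as (x, y, width, height), a detection counts as correct when its intersection over union (IoU) with an unmatched labelled box is at least 0.5, and the area is accumulated without interpolation. The MATLAB routine used in the study may differ in these details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Area under the precision-recall curve for one detector.

    detections: list of (frame_id, score, box).
    ground_truth: dict mapping frame_id to a list of labelled boxes.
    The 0.5 threshold is an assumed, commonly used value.
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {f: [False] * len(b) for f, b in ground_truth.items()}
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    # Visit detections from most to least confident.
    for frame, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(frame, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A labelled box can be claimed by one detection only.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[frame][best_idx]:
            matched[frame][best_idx] = True
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_gt
        # Add the rectangle gained since the previous recall level.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

A detector that ranks every correct box above every false alarm scores 1.0; false alarms with high confidence pull the value down, which is why average precision is a useful single number for comparing the six cases.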


3. RESULTS AND DISCUSSION 

A precision-recall (PR) graph was evaluated to demonstrate the performance of each detector. The
average precision obtained from the PR graph was analyzed to determine which model performs best, that
is, which model spots the badminton players well in every testing video. Figure 8 to Figure 10 show the
detected players with their scores. In general, Faster R-CNN detected the players successfully when it
was tested with the same video data that was used to train it. Figure 11 shows the performance of the
detectors for all six cases stated in Table 1. The detector detects players better when it is trained and
tested on the same video than when it is trained and tested on different videos. Furthermore, the detector
locates the players in most of the videos when it is trained with more generalized video data. In
Figure 11, the detectors in Case 1, Case 2 and Case 4 produce high precision when trained and tested with
the same video dataset, compared with testing on a different video. For instance, the detector model in
Case 1 achieves a high average precision when tested with video 1, the video used to train it, but a lower
average precision when tested with other video combinations. The same holds for Case 2 and Case 4.

In addition, the performance of Faster R-CNN improves when it is trained with a combination of
different videos. As shown in Figure 11, the trained detectors for Case 3, Case 5 and Case 6 perform better
on almost all video combinations than those of Case 1, Case 2 and Case 4. Faster R-CNN performs best in
detecting players when it is trained with the combination of all videos, as shown by Case 6. This is
because the detector is given a variety of data with different conditions and more generalized
information. Figure 11 indicates that this trained detector detects the players in almost all conditions
since it has learned a variety of features. Hence, the detector requires a large amount of data covering
varied colors, textures and conditions so that it can detect players well in broadcast video from
different sources.
Figure 10. Detected player from video 3


Figure 11. Average Precision vs Trained Model Detector


4. CONCLUSION AND FURTHER WORK 

In summary, this study has presented a technique for automatic player detection from broadcast
video using Faster R-CNN. By training the detector with a variety of badminton videos, including a
combination of single and double matches, the system can benefit coaches and sport analysts in
improving their players' performance. Overall, most of the models perform well based on the average
precision obtained from the precision-recall graphs, although several models do not detect the players
well because of the factors discussed in Section 3.
To our knowledge, there has been no previous study on automatic player detection for broadcast badminton
video using deep learning. This approach also has the potential to detect players in other sport videos or
in non-sport videos. In future, several improvements are advised. The detector should be validated by
training and testing under more diverse conditions, such as low-quality and high-definition video, to
ensure that the system can reliably distinguish players from the background. Furthermore, it would be
worthwhile to reduce the computational time, since the proposed system requires a rather long training
duration. Lastly, researchers have the opportunity to explore appropriate techniques for detecting players
from live video rather than from image frames.